NNSVS vs. Sinsy

This notebook presents audio samples for comparing NNSVS and Sinsy.

Models

  • sinsy_f00001j: Sinsy’s HMM-based SVS system.

  • sinsy_f00001j_dnn_beta4: Sinsy’s DNN-based SVS system.

  • nnsvs_yoko: NNSVS-based system trained on the publicly available version of the nit-song070 database. Specifically, we used 29 of its 31 songs for training. Note that the model parameters were initialized from models pre-trained on the kiritan_singing database (49 songs for training). The system therefore used 49 + 29 = 78 songs in total for training.

Notes

  • Training data: According to the latest Sinsy paper, the authors seem to have used 60 songs (out of 70) for training. Since the publicly available version of the nit-song070 dataset contains only a subset of the full dataset, we are unable to train NNSVS models under the same training data condition.

  • Date: Sinsy samples were generated on 2022/03/27 using https://www.sinsy.jp/.

Preparation

[1]:
%%capture
try:
    import nnsvs
except ImportError:
    ! pip install git+https://github.com/r9y9/nnsvs
[2]:
%matplotlib inline
%load_ext autoreload
%autoreload
import IPython
from IPython.display import Audio
from scipy.io import wavfile
import pysinsy
from nnmnkwii.io import hts
from urllib.request import urlretrieve
import tempfile
[3]:
from nnsvs.pretrained import create_svs_engine
import nnsvs
[4]:
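def svs_display(model, xml_file):
    """Synthesize a MusicXML score with a pretrained NNSVS model and play it."""
    engine = create_svs_engine(model)
    # Convert the score to HTS-style full-context labels with Sinsy's front end
    contexts = pysinsy.extract_fullcontext(xml_file)
    labels = hts.HTSLabelFile.create_from_contexts(contexts)
    wav, sr = engine.svs(labels)
    IPython.display.display(Audio(wav, rate=sr))

def wav_display(url):
    """Download a WAV file from `url` and play it."""
    with tempfile.NamedTemporaryFile(suffix=".wav") as f:
        # Read the file back while the temporary file still exists
        urlretrieve(url, f.name)
        sr, wav = wavfile.read(f.name)
    IPython.display.display(Audio(wav, rate=sr))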
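
The Sinsy samples used with `wav_display` are Dropbox share links ending in `?dl=1`. That query parameter requests the file itself rather than an HTML preview page, which is what `urlretrieve` needs in order to get a readable WAV file. The following is a minimal sketch, assuming a Dropbox-style share link; `force_direct_download` is a hypothetical helper introduced here for illustration and is not part of NNSVS or of this notebook.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_direct_download(url):
    """Return `url` with its `dl` query parameter set to 1.

    Hypothetical helper for illustration; assumes a Dropbox-style share link.
    """
    parts = urlsplit(url)
    # Drop any existing `dl` value and keep every other query parameter.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A link copied from the Dropbox web UI (typically ending in `?dl=0`) could then be passed as `wav_display(force_direct_download(url))`.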

Sample 1: げんこつ山のタヌキさん

[5]:
print("sinsy_f00001j")
wav_display("https://www.dropbox.com/s/qq6w7bbcc5ikcdf/sinsy_song070_f00001j_063.wav?dl=1")
print("nnsvs_yoko")
svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file("song070_f00001_063"))
print("sinsy_f00001j_dnn_beta4")
wav_display("https://www.dropbox.com/s/4epe08wqebyuh4g/sinsy_song070_f00001j_dnn_beta4_063.wav?dl=1")
sinsy_f00001j
nnsvs_yoko
sinsy_f00001j_dnn_beta4

Sample 2: Get Over

[6]:
print("sinsy_f00001j")
wav_display("https://www.dropbox.com/s/kam9kju97umi6li/sinsy_f00001j_get_over.wav?dl=1")
print("nnsvs_yoko")
svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file("get_over"))
print("sinsy_f00001j_dnn_beta4")
wav_display("https://www.dropbox.com/s/7st0acvguvbdoaj/sinsy_f00001j_dnn_beta4_get_over.wav?dl=1")
sinsy_f00001j
nnsvs_yoko
sinsy_f00001j_dnn_beta4

Sample 3: 雪

[7]:
print("sinsy_f00001j")
wav_display("https://www.dropbox.com/s/ho5xgkil8r3f3ed/sinsy_yuki_f00001j.wav?dl=1")
print("nnsvs_yoko")
svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file("yuki"))
print("sinsy_f00001j_dnn_beta4")
wav_display("https://www.dropbox.com/s/jo2ool0nytzxln2/sinsy_yuki_f00001j_dnn_beta4.wav?dl=1")
sinsy_f00001j
nnsvs_yoko
sinsy_f00001j_dnn_beta4
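
The three sample cells above repeat the same print-then-play pattern. As a sketch only, that pattern could be factored into a single helper. `compare_systems` and its `play_url` and `play_nnsvs` parameters are hypothetical names introduced here; in this notebook they would be bound to `wav_display` and to a small wrapper around `svs_display`, as shown in the trailing comment.

```python
def compare_systems(systems, play_url, play_nnsvs):
    """Print each system's name, then play its sample, in the given order.

    `systems` is a list of (name, kind, source) tuples, where `kind` is
    "url" for a pre-rendered WAV file or "nnsvs" for an example score name.
    """
    for name, kind, source in systems:
        print(name)
        if kind == "url":
            play_url(source)
        elif kind == "nnsvs":
            play_nnsvs(source)
        else:
            raise ValueError(f"unknown kind: {kind!r}")


# Possible use in this notebook (needs nnsvs and network access):
# compare_systems(
#     [
#         ("sinsy_f00001j", "url", "https://www.dropbox.com/s/ho5xgkil8r3f3ed/sinsy_yuki_f00001j.wav?dl=1"),
#         ("nnsvs_yoko", "nnsvs", "yuki"),
#     ],
#     play_url=wav_display,
#     play_nnsvs=lambda name: svs_display("r9y9/yoko_latest", nnsvs.util.example_xml_file(name)),
# )
```

Passing the playback functions as arguments keeps the helper independent of NNSVS, so the ordering logic can be checked without downloading models or audio.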